Yak: A High-Performance Big-Data-Friendly Garbage Collector

نویسندگان

  • Khanh Nguyen
  • Lu Fang
  • Guoqing Xu
  • Brian Demsky
  • Shan Lu
  • Sanazsadat Alamian
  • Onur Mutlu
چکیده

Most “Big Data” systems are written in managed languages, such as Java, C#, or Scala. These systems suffer from severe memory problems due to the massive volume of objects created to process input data. Allocating and deallocating a sea of data objects puts a severe strain on existing garbage collectors (GC), leading to high memory management overheads and reduced performance. This paper describes the design and implementation of Yak, a “Big Data” friendly garbage collector that provides high throughput and low latency for all JVM-based languages. Yak divides the managed heap into a control space (CS) and a data space (DS), based on the observation that a typical data-intensive system has a clear distinction between a control path and a data path. Objects created in the control path are allocated in the CS and subject to regular tracing GC. The lifetimes of objects in the data path often align with epochs creating them. They are thus allocated in the DS and subject to regionbased memory management. Our evaluation with three large systems shows very positive results.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A High-performance Garbage Collector for Standard ML

We have designed and implemented a new garbage collector for the Standard ML of New JerseySystem (SML/NJ). This collector has higher performance, lower latency and generally requires lessphysical memory than the existing SML/NJ collector. In addition, it is able to exploit the largesecondary caches found on modern workstations. This paper describes the design of the collector,an...

متن کامل

Mostly-Copying Collection: A Viable Alternative to Conservative Mark-Sweep

Many high-level language compilers generate C code and then invoke a C compiler to do code generation, register allocation, stack management, and low-level optimization. To date, most of these compilers link the resulting code against a conservative mark-sweep garbage collector in order to reclaim unused memory. We introduce a new collector, MCC, based on mostly-copying collection, and characte...

متن کامل

How Data Volume Affects Spark Based Data Analytics on a Scale-up Server

Sheer increase in volume of data over the last decade has triggered research in cluster computing frameworks that enable web enterprises to extract big insights from big data. While Apache Spark is gaining popularity for exhibiting superior scale-out performance on the commodity machines, the impact of data volume on the performance of Spark based data analytics in scale-up configuration is not...

متن کامل

Broom: Sweeping Out Garbage Collection from Big Data Systems

Many popular systems for processing “big data” are implemented in high-level programming languages with automatic memory management via garbage collection (GC). However, high object churn and large heap sizes put severe strain on the garbage collector. As a result, applications underperform significantly: GC increases the runtime of typical data processing tasks by up to 40%. We propose to use ...

متن کامل

Designing a Concurrent Hardware Garbage Collector for Small Embedded Systems

Today more and more functionality is packed into all kinds of embedded systems, making high-level languages, such as Java, increasingly attractive as implementation languages. However, certain aspects, essential to high-level languages are much harder to address in a low performance, small embedded system than on a desktop computer. One of these aspects is memory management with garbage collect...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016